图神经网络(GNN)已证明其在各种应用中的表现出色。然而,其背后的工作机制仍然神秘。 GNN模型旨在学习图形结构数据的有效表示,该数据本质上与图形信号denoising(GSD)的原理相吻合。算法展开是一种“学习优化”技术的算法,由于其在构建高效和可解释的神经网络体系结构方面的前景,人们引起了人们的关注。在本文中,我们引入了基于GSD问题的截断优化算法(例如梯度下降和近端梯度下降)构建的一类展开网络。它们被证明与许多流行的GNN模型紧密相连,因为这些GNN中的正向传播实际上是为特定GSD提供服务的展开网络。此外,可以将GNN模型的训练过程视为解决了较低级别的GSD问题的双重优化问题。这种连接带来了GNN的新景,因为我们可以尝试从GSD对应物中理解它们的实际功能,并且还可以激励设计新的GNN模型。基于算法展开的观点,一种名为UGDGNN的表达模型,即展开的梯度下降GNN,进一步提出了继承具有吸引力的理论属性的。七个基准数据集上的大量数值模拟表明,UGDGNN可以比最新模型实现卓越或竞争性的性能。
translated by 谷歌翻译
近年来,基于深度学习的面部检测算法取得了长足的进步。这些算法通常可以分为两类,即诸如更快的R-CNN和像Yolo这样的单阶段检测器之类的两个阶段检测器。由于准确性和速度之间的平衡更好,因此在许多应用中广泛使用了一阶段探测器。在本文中,我们提出了一个基于一阶段检测器Yolov5的实时面部检测器,名为Yolo-Facev2。我们设计一个称为RFE的接收场增强模块,以增强小面的接受场,并使用NWD损失来弥补IOU对微小物体的位置偏差的敏感性。对于面部阻塞,我们提出了一个名为Seam的注意模块,并引入了排斥损失以解决它。此外,我们使用重量函数幻灯片来解决简单和硬样品之间的不平衡,并使用有效的接收场的信息来设计锚。宽面数据集上的实验结果表明,在所有简单,中和硬子集中都可以找到我们的面部检测器及其变体的表现及其变体。源代码https://github.com/krasjet-yu/yolo-facev2
translated by 谷歌翻译
作为遗传和生理方面之间的桥梁,动物行为分析是生物学和生态学研究中最重要的主题之一。但是,识别,跟踪和记录动物行为是需要专业知识的劳动密集型作品。为了减轻注释数据的支出,研究人员转向用于自动标签算法的计算机视觉技术,因为大多数数据都是视觉记录的。在这项工作中,我们探讨了各种行为检测算法,涵盖了传统的视觉方法,统计方法和深度学习方法。这项工作的目的是对相关工作进行彻底的研究,为生物学家提供有效的动物行为检测方法。除此之外,我们还讨论了这些算法的优势和缺点,以为已经深入研究该领域的人们提供一些见解。
translated by 谷歌翻译
In this paper, we study a sequential decision-making problem, called Adaptive Sampling for Discovery (ASD). Starting with a large unlabeled dataset, algorithms for ASD adaptively label the points with the goal to maximize the sum of responses. This problem has wide applications to real-world discovery problems, for example drug discovery with the help of machine learning models. ASD algorithms face the well-known exploration-exploitation dilemma. The algorithm needs to choose points that yield information to improve model estimates but it also needs to exploit the model. We rigorously formulate the problem and propose a general information-directed sampling (IDS) algorithm. We provide theoretical guarantees for the performance of IDS in linear, graph and low-rank models. The benefits of IDS are shown in both simulation experiments and real-data experiments for discovering chemical reaction conditions.
translated by 谷歌翻译
课程学习(CL)是一种常用的机器学习培训策略。但是,我们仍然缺乏对CL的利益的明确理论上了解。在本文中,我们在结构化和非结构化设置下研究了CL在Multitask线性回归问题中的好处。对于两个设置,我们使用提供最佳课程和没有Oracle的Oracle获得最小的Minimax速率,而代理必须自适应地学习良好的课程。我们的结果表明,自适应学习可能比非结构化环境中的Oracle学习更难地更难,但它仅在结构化设置中引入了一个小额外的术语。为了将理论与实践连接,我们通过比较其保证与上述最小值率的保证来提供具有最高局部预测增益的任务的受欢情的经验方法。
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译
As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
translated by 谷歌翻译
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scene, low image resolution and noise interference are new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network. (2) Using generated sample pairs to simulate quality variance distributions to help contrastive learning strategies obtain robust feature representation under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
translated by 谷歌翻译
Interview has been regarded as one of the most crucial step for recruitment. To fully prepare for the interview with the recruiters, job seekers usually practice with mock interviews between each other. However, such a mock interview with peers is generally far away from the real interview experience: the mock interviewers are not guaranteed to be professional and are not likely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to have online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from the online interview data and provides mock interview services to the job seekers. The task is challenging in two ways: (1) the interview data are now available but still of low-resource; (2) to generate meaningful and relevant interview dialogs requires thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and dialog generator so that most parameters can be trained with ungrounded dialogs as well as the resume data that are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results to generate mock interviews. With the help of EZInterviewer, we hope to make mock interview practice become easier for job seekers.
translated by 谷歌翻译
Nowadays, time-stamped web documents related to a general news query floods spread throughout the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process with sequential information remained and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extracting summary, where the extracted summary also comes in time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
translated by 谷歌翻译